For those who want to try running the simulations in real time:
QR code linking to https://github.com/sitalaura/link-functions/tree/main/R
or download the file at sitalaura.github.io/link-functions/R/datasim.R
independent variable: age in years (years)
dependent variable: (variable)
using the classical linear predictor
What we do not see, because it is a default argument hidden in our code:
the model uses the gaussian family and the identity link function
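A minimal sketch of the hidden default, using hypothetical toy data (the variable names and coefficients here are illustrative, not taken from the poster's simulation): leaving `family` out of `glm()` should give the same fit as spelling out the gaussian family with the identity link, which in turn matches ordinary linear regression.

```r
set.seed(1)
# Toy data (hypothetical): any continuous outcome will do
age <- runif(200, 6, 10)
y   <- 2 + 0.5 * age + rnorm(200)

fit_default  <- glm(y ~ age)                                        # family left implicit
fit_explicit <- glm(y ~ age, family = gaussian(link = "identity"))  # same model, spelled out
fit_lm       <- lm(y ~ age)                                         # ordinary linear regression

# Compare the estimated coefficients across the three fits
print(all.equal(coef(fit_default), coef(fit_explicit)))
print(all.equal(coef(fit_default), coef(fit_lm)))

# Inspect the family and link actually used by the default call
print(family(fit_default))
```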
In GLMs the link function relates the linear predictor (built from X) to the mean of the response variable Y;
its inverse re-maps the linear predictor onto the appropriate range of Y
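To make the re-mapping concrete, here is a small sketch using base R's `make.link()`. The values of `eta` are arbitrary illustrative numbers: the inverse of the identity link leaves them untouched, while the inverse of the logit link squeezes them into the interval between 0 and 1.

```r
eta <- c(-10, -2, 0, 2, 10)   # linear predictor values, anywhere on the real line

identity_link <- make.link("identity")
logit_link    <- make.link("logit")

mu_identity <- identity_link$linkinv(eta)  # unchanged: still unbounded
mu_logit    <- logit_link$linkinv(eta)     # squeezed into (0, 1)

# The link itself goes the other way, from the mean back to the linear predictor
eta_back <- logit_link$linkfun(mu_logit)

print(rbind(eta, mu_identity, mu_logit, eta_back))
```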
independent variable: age in years (years)
dependent variable: mistakes in a TRUE/FALSE task (accuracy)
using the classical linear predictor
The new data simulated from the model clearly fall outside the (0,1) range of possible values for accuracy
In the first example an identity link was appropriate because the dependent variable (unspecified) spans from -inf to +inf.
Here an identity link is NOT appropriate because the dependent variable (accuracy) spans from 0 to 1.
In this case, link="logit" makes sure that y stays between 0 and 1.
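The contrast can be checked with a short simulation. This is only a sketch under assumed values: the generating coefficients, the sample size `N`, and the number of trials `k` are hypothetical choices, not the ones behind the poster's figures. It fits both models to the same accuracy data and draws new data from each with `simulate()`.

```r
set.seed(123)
k   <- 50                                   # trials per child (assumed)
N   <- 500                                  # number of children (assumed)
age <- runif(N, 6, 10)
p   <- plogis(-4 + 0.9 * age)               # true accuracy, close to 1 at older ages
accuracy <- rbinom(N, size = k, prob = p) / k
d <- data.frame(age = age - mean(age), accuracy)

# Gaussian family with identity link (the glm default)
fit_gauss <- glm(accuracy ~ age, data = d)
# Binomial family with logit link; weights give the number of trials
fit_logit <- glm(accuracy ~ age, data = d,
                 family = binomial(link = "logit"), weights = rep(k, N))

# New data simulated from each fitted model
sim_gauss <- simulate(fit_gauss, nsim = 1)[[1]]
sim_logit <- simulate(fit_logit, nsim = 1)[[1]]

print(range(sim_gauss))  # not constrained to the 0-1 interval
print(range(sim_logit))  # proportions, constrained to the 0-1 interval
```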
independent variable: age in years (years)
dependent variable: mistakes in a TRUE/FALSE task (accuracy)
adding a new main effect
groups: typically developing children (group = 0)
children with dyslexia (group = 1)
a positive interaction emerges
Call:
glm(formula = accuracy ~ age * group, data = d)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.945828   0.002492  379.52   <2e-16 ***
age          0.051628   0.002064   25.02   <2e-16 ***
group1      -0.087662   0.003601  -24.34   <2e-16 ***
age:group1   0.062356   0.002989   20.86   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.003234223)
Null deviance: 16.3694 on 999 degrees of freedom
Residual deviance: 3.2213 on 996 degrees of freedom
AIC: -2890.1
Number of Fisher Scoring iterations: 2
a negative interaction emerges
fit = glm(accuracy ~ age*group, data=d, family=binomial(link="logit"), weights= rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "logit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.20636    0.06846  61.444  < 2e-16 ***
age          1.62923    0.04597  35.441  < 2e-16 ***
group1      -1.71289    0.07531 -22.745  < 2e-16 ***
age:group1  -0.38975    0.05180  -7.524  5.3e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8909.4 on 999 degrees of freedom
Residual deviance: 1033.5 on 996 degrees of freedom
AIC: 3106.6
Number of Fisher Scoring iterations: 5
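A note on the `weights = rep(k, nrow(d))` argument in the binomial fit: when the response is a proportion, the weights tell `glm()` how many trials each proportion is based on. The sketch below uses hypothetical toy data to show that this should give the same coefficients as the two-column (successes, failures) form of the response.

```r
set.seed(42)
k   <- 50                                   # trials per child (assumed)
N   <- 200                                  # number of children (assumed)
age <- runif(N, 6, 10)
correct  <- rbinom(N, size = k, prob = plogis(-4 + 0.7 * age))  # number of correct answers
accuracy <- correct / k                                         # proportion correct

# (a) proportion as response, number of trials passed as weights
fit_prop  <- glm(accuracy ~ age, family = binomial(link = "logit"),
                 weights = rep(k, N))
# (b) two-column response: successes and failures
fit_count <- glm(cbind(correct, k - correct) ~ age,
                 family = binomial(link = "logit"))

print(all.equal(coef(fit_prop), coef(fit_count)))
```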
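library(psyphy)   # provides mafc.probit()
library(ggplot2)  # provides ggplot()
k = 50    # trials per child
N = 1000  # number of children
group = rbinom(N,1,.5)
age = runif(N,6,10)
# linear predictor: main effects of age and group only, NO interaction
eta = -6+1*age-1*group
# map eta to probabilities with the 2-alternative forced-choice probit link
probs = mafc.probit(.m = 2)$linkinv(eta)
accuracy = rbinom(n = N, size = k, prob = probs) / k
d = data.frame(age=age-mean(age),accuracy,group=as.factor(group))
ggplot(d,aes(x=age,y=accuracy,color=group))+
  geom_point()+
  xlab("Age-centered")
I did not simulate an interaction, so BOTH models find an interaction that is not there.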
Let's try the multiple-alternative forced-choice (mAFC) probit link, with a 50% guessing rate because the task is true/false
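What makes this link different from a plain probit is its lower asymptote. The sketch below hand-writes the inverse link in base R, assuming the usual form in which the probit curve is rescaled to start at the guessing rate 1/m. The function name `mafc_probit_inv` is hypothetical, and the form should be checked against the psyphy documentation for `mafc.probit()`.

```r
# Hand-written inverse link for an m-alternative forced-choice probit (assumed form)
mafc_probit_inv <- function(eta, m = 2) {
  guess <- 1 / m                       # chance-level accuracy
  guess + (1 - guess) * pnorm(eta)     # rescale the probit curve to (1/m, 1)
}

eta <- c(-5, -1, 0, 1, 5)
comparison <- cbind(eta,
                    probit      = pnorm(eta),            # lower asymptote at 0
                    mafc_probit = mafc_probit_inv(eta))  # lower asymptote at 0.5
print(round(comparison, 3))
```

With a true/false task, accuracy cannot be expected to fall below chance, so a link whose curve starts at 0.5 rather than 0 matches the range the data can actually take.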
No interaction emerges, as it should!
fit = glm(accuracy ~ age*group, data=d, family=binomial(link=mafc.probit(.m=2)), weights= rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = mafc.probit(.m = 2)),
data = d, weights = rep(k, nrow(d)))
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.117434   0.042607  49.697   <2e-16 ***
age          1.024366   0.032983  31.058   <2e-16 ***
group1      -1.025316   0.047097 -21.770   <2e-16 ***
age:group1   0.002484   0.039268   0.063     0.95
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8831.50 on 999 degrees of freedom
Residual deviance: 805.19 on 996 degrees of freedom
AIC: 2767.5
Number of Fisher Scoring iterations: 6
equal intervals on X correspond to equal intervals on Y
(note: label the x and y axes with the names of the variables from the example)
equal intervals on X correspond to equal ratios (NOT equal intervals) on Y
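These two statements can be verified numerically. The sketch below assumes that the first refers to the identity link and the second to a log link (the poster text does not name the links explicitly), and uses arbitrary illustrative coefficients.

```r
x   <- 0:4                 # equally spaced predictor values
eta <- 1 + 0.5 * x         # linear predictor (illustrative coefficients)

y_identity <- eta          # identity link: y equals the linear predictor
y_log      <- exp(eta)     # log link: y is the exponential of the linear predictor

diffs_identity <- diff(y_identity)                 # constant differences
diffs_log      <- diff(y_log)                      # differences grow with x
ratios_log     <- y_log[-1] / y_log[-length(y_log)]  # constant ratios

print(diffs_identity)
print(diffs_log)
print(ratios_log)
```

Under the log link each one-unit step in x multiplies y by the same factor, `exp(0.5)` in this sketch, so the absolute steps on y keep growing.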
Building a model means trying to recover the data-generating process which, unlike in the world of simulations, we can never know for sure
To do that we must make important decisions:
choosing the most appropriate family of distributions, so that the predicted values of the dependent variable lie within its bounds
choosing the most appropriate link function: otherwise you are very likely to end up finding non-linear effects (i.e., interactions) that are not there!
We are conducting a systematic review of how often inappropriate link functions are used in psychological research and how often they lead to a significant interaction: so far, quite often
All materials are available on GitHub at sitalaura/link-functions
Questions and feedback: laura.sita@studenti.unipd.it
Domingue, B. W., Kanopka, K., Trejo, S., Rhemtulla, M., & Tucker-Drob, E. M. (2024). Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome’s distribution and metric properties. Psychological Methods, 29(6), 1164.
Hardwicke, T. E., Thibault, R. T., Clarke, B., Moodie, N., Crüwell, S., Schiavone, S. R., Handcock, S. A., Nghiem, K. A., Mody, F., Eerola, T., et al. (2024). Prevalence of transparent research practices in psychology: A cross-sectional study of empirical articles published in 2022. Advances in Methods and Practices in Psychological Science, 7(4), 25152459241283477.
Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong? Journal of Experimental Social Psychology, 79, 328-348.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156.
Special thanks to
a negative interaction emerges
fit = glm(accuracy ~ age*group, data=d, family=binomial(link="probit"), weights= rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "probit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.21133    0.03018  73.280  < 2e-16 ***
age          0.81113    0.02295  35.337  < 2e-16 ***
group1      -0.79152    0.03400 -23.279  < 2e-16 ***
age:group1  -0.11299    0.02637  -4.285 1.83e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8812.26 on 999 degrees of freedom
Residual deviance: 853.32 on 996 degrees of freedom
AIC: 2928.7
Number of Fisher Scoring iterations: 6
Cognitive Science Arena 2026